---
subtitle: Evaluate prompts that include images
---

Opik lets you evaluate multimodal prompts that combine text and images. You can run these experiments directly from the UI or through the SDKs. This page covers both flows, clarifies which models support image inputs, and explains how to customise model detection.

## Online evaluation in the UI

LLM-as-a-Judge experiments in the Opik UI accept image attachments on both the dataset rows and the prompt messages. When you configure an evaluation:

1. Open **Evaluations → LLM-as-a-Judge** and click **New evaluation**.
2. Choose a vision-capable model (for example, `gpt-4o` or `claude-3-5-sonnet`).
3. Add image URLs or upload files in either the **Dataset** rows or the **Prompt** builder.
4. Launch the evaluation. Opik automatically keeps the images in the judge prompt when the selected model supports them. If the model does not support images, the UI surfaces a warning and flattens the image reference into a text placeholder so the run still completes.

All multimodal traces appear in the evaluation results, so you can inspect exactly what the judge model received.

## Using the SDKs

Both the Python and TypeScript SDKs accept OpenAI-style message payloads. Each message can contain either a string or a list of content blocks. Image blocks use the `image_url` type and can point to an `https://` URL or a `data:image/...;base64,` payload.
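Because a base64 image has to be passed as a complete data URL, it can help to build content blocks programmatically rather than by hand. The following sketch is plain Python with no Opik dependency. The helper names `to_data_url`, `image_block`, and `user_message` are hypothetical, and the `image/png` default MIME type is an assumption you should match to your actual files.

```python
import base64
from typing import Any, Dict, Optional


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a data URL: data:<mime>;base64,<payload>."""
    payload = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def image_block(url: str, detail: Optional[str] = None) -> Dict[str, Any]:
    """Build an OpenAI-style image_url content block."""
    image_url: Dict[str, Any] = {"url": url}
    if detail is not None:
        image_url["detail"] = detail  # e.g. "low" or "high"
    return {"type": "image_url", "image_url": image_url}


def user_message(text: str, *image_urls: str) -> Dict[str, Any]:
    """A user message whose content is a text block followed by image blocks."""
    content = [{"type": "text", "text": text}]
    content.extend(image_block(url) for url in image_urls)
    return {"role": "user", "content": content}


message = user_message(
    "Describe the following picture.",
    "https://example.com/cat.jpg",
    # Placeholder bytes (only the 8-byte PNG signature), not a real image.
    to_data_url(b"\x89PNG\r\n\x1a\n"),
)
```

In practice you would read the bytes from disk, for example with `pathlib.Path("cat.png").read_bytes()`, and choose the MIME type that matches the file.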

### Python example

```python
from opik import Opik
from opik.evaluation import evaluate_prompt, metrics

client = Opik()
dataset = client.get_or_create_dataset("vision_captions")
dataset.insert(
    [
        {
            "input": {
                "image_source": "https://example.com/cat.jpg",
            },
            "reference": "A grey cat sitting on a sofa",
        },
        {
            "input": {
                "image_source": "...",  # base64 works too
            },
            "reference": "An orange cat playing with a toy",
        },
    ]
)

MESSAGES = [
    {
        "role": "system",
        "content": "You are an assistant that analyses the attached image.",
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the following picture."},
            {
                "type": "image_url",
                "image_url": {"url": "{{image_source}}", "detail": "high"},
            },
        ],
    },
]

evaluate_prompt(
    dataset=dataset,
    messages=MESSAGES,
    model="gpt-4o-mini",
    scoring_metrics=[metrics.Equals()],  # compares output against the dataset `reference`
)
```

The evaluator uses LiteLLM-style model identifiers. Opik recognises popular multimodal families (OpenAI GPT-4o, Anthropic Claude 3+, Google Gemini 1.5, Meta Llama 3.2 Vision, Mistral Pixtral, etc.) and treats any model whose name ends with `-vision` or `-vl` as vision-capable. Provider prefixes such as `anthropic/` are stripped automatically before matching. When a model is not recognised as vision-capable, Opik logs a warning and replaces image blocks with text placeholders before making the API call.
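To make these detection rules concrete, here is a minimal sketch of how such a check could work. It is not Opik's implementation: the function names, the short family-prefix list, and the choice to strip everything up to the last `/` are all illustrative assumptions, and Opik's real registry covers more models. The sketch checks runtime additions first, then the `-vision`/`-vl` suffix rule, then a few family prefixes.

```python
from typing import Set, Tuple

# Illustrative family prefixes only; Opik's real registry is larger and may match differently.
VISION_FAMILY_PREFIXES: Tuple[str, ...] = ("gpt-4o", "claude-3", "gemini-1.5", "pixtral")
VISION_SUFFIXES: Tuple[str, ...] = ("-vision", "-vl")

# Models registered at runtime (a hypothetical stand-in for a custom registry).
_custom_vision_models: Set[str] = set()


def normalise(model: str) -> str:
    """Drop any provider prefix (everything up to the last '/') and lowercase."""
    return model.rsplit("/", 1)[-1].lower()


def add_vision_model(model: str) -> None:
    """Hypothetical helper: register a custom model as vision-capable."""
    _custom_vision_models.add(normalise(model))


def supports_vision(model: str) -> bool:
    """Apply the custom registry, then the suffix rule, then the family prefixes."""
    name = normalise(model)
    if name in _custom_vision_models:
        return True
    if name.endswith(VISION_SUFFIXES):
        return True
    return name.startswith(VISION_FAMILY_PREFIXES)
```

Note that a name like `sparrow-vision-beta` does not end with `-vision`, so under this kind of suffix rule it would need to be registered explicitly.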

### TypeScript note

TypeScript support for multimodal evaluations is in progress. The TypeScript SDK will expose the same message structure and detection rules; we’ll update this section with a full example once the implementation lands.

## Customising model support

If you are experimenting with a new provider, you can extend the registry at runtime:

```python
from opik.evaluation.models import ModelCapabilities

ModelCapabilities.add_vision_model("my-provider/sparrow-vision-beta")
```

Any subsequent evaluations in that process will treat the custom model as vision-capable.

## FAQ

### How do I confirm whether a model supports images?

```python
from opik.evaluation.models import ModelCapabilities

print(ModelCapabilities.supports_vision("anthropic/claude-3-opus"))  # prints True or False
```

If the call returns `False`, Opik logs a warning and flattens image blocks: the image reference (the URL or base64 payload) is inserted into the prompt as text, truncated to the first 500 characters to keep prompts manageable.
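The flattening step can be pictured roughly as follows. This is a hypothetical sketch, not Opik's code: the `[image: ...]` placeholder format is an assumption, the 500-character limit is applied to each image reference, and content block types other than text and images are simply dropped. Opik's actual placeholder text and truncation point may differ.

```python
from typing import Any, Dict, List

MAX_IMAGE_CHARS = 500  # the limit described in this section


def flatten_image_blocks(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Replace image_url blocks with text placeholders for text-only models."""
    flattened = []
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            # Plain string content has no image blocks; keep it unchanged.
            flattened.append(dict(message))
            continue
        parts = []
        for block in content:
            if block.get("type") == "text":
                parts.append(block["text"])
            elif block.get("type") == "image_url":
                url = block["image_url"]["url"]
                # Hypothetical placeholder format; keeps only the first 500 characters.
                parts.append(f"[image: {url[:MAX_IMAGE_CHARS]}]")
            # Other block types are dropped in this sketch.
        flattened.append({**message, "content": "\n".join(parts)})
    return flattened
```

Truncation matters most for base64 payloads, which can run to hundreds of kilobytes and would otherwise swamp a text-only prompt.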

### What kind of image sources can I use?

- Direct `https://` URLs (publicly accessible).
- Base64 data URLs such as `...`.
- Optional OpenAI `detail` fields (`"low"`, `"high"`) are preserved and forwarded when present.
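Before inserting rows into a dataset, a quick local check can catch malformed sources. The helper below is hypothetical and not part of the Opik SDK. It only checks the shape of a rendered block (it cannot tell whether a URL is publicly accessible), so run it on final URLs rather than on unrendered template placeholders. It also accepts `"auto"`, which OpenAI documents as a third `detail` value.

```python
import base64
from typing import Any, Dict, List
from urllib.parse import urlparse

# The list above names "low" and "high"; OpenAI also documents "auto".
ALLOWED_DETAIL = {"low", "high", "auto"}


def check_image_block(block: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the block looks well-formed."""
    if block.get("type") != "image_url":
        return ["block type is not 'image_url'"]
    image_url = block.get("image_url") or {}
    url = image_url.get("url", "")
    problems: List[str] = []
    if url.startswith("data:image/"):
        header, _, payload = url.partition(",")
        if not header.endswith(";base64") or not payload:
            problems.append("data URL has no base64 payload")
        else:
            try:
                base64.b64decode(payload, validate=True)
            except ValueError:  # binascii.Error subclasses ValueError
                problems.append("base64 payload does not decode")
    else:
        parsed = urlparse(url)
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append("URL is neither https:// nor data:image/...")
    detail = image_url.get("detail")
    if detail is not None and detail not in ALLOWED_DETAIL:
        problems.append(f"unexpected detail value: {detail!r}")
    return problems
```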

### Does this work with LangChain integrations?

Yes. Opik forwards the same OpenAI-style content blocks that LangChain expects, so structured messages with `image_url` dictionaries continue to work. A simple validation script is shown below:

```python
from langchain_core.messages import HumanMessage, SystemMessage, messages_to_dict
from langchain_core.prompts import ChatPromptTemplate
from opik.evaluation.models.langchain.message_converters import convert_to_langchain_messages

# Directly using LangChain message objects
plain_messages = [
    SystemMessage(content="You are an assistant."),
    HumanMessage(content="Describe the weather in Paris."),
]

convert_to_langchain_messages(messages_to_dict(plain_messages))

# Using a ChatPromptTemplate with multimodal content
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    (
        "user",
        [
            {"type": "text", "text": "Describe the following image."},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://python.langchain.com/img/phone_handoff.jpeg",
                    "detail": "high",
                },
            },
        ],
    ),
])

rendered = chat_prompt.invoke({})
messages = messages_to_dict(rendered.messages)

convert_to_langchain_messages(messages)  # round-trip validation
```
